Abstract - Severe acute respiratory syndrome (SARS) is
a serious form of pneumonia that results in acute
respiratory distress and sometimes death. In this study,
we applied the reverse vaccinology approach to
determine the antigenic determinant sites present on the
proteins of the SARS coronavirus Tor2 strain. The method
incorporates the prediction of antigenic sites, solvent
accessible regions and secondary structure, B-cell
epitope prediction, and the design of the antigenic
determinant. The results of the study suggest that the
small envelope protein and the orf8 protein could be
potential candidates for vaccine design. The high-scoring
antigenic peptides were designed and optimized. It is
inferred that a peptide of the small envelope protein
could make a stable and effective vaccine targeting the
E protein of the virus, which is responsible for the
spread of the virion. This study provides a potential
optimized vaccine candidate against SARS, with a good
chance of successful immunization against the disease.
I. INTRODUCTION


SARS-CoV Tor2 shows similarity to the group 2
coronaviruses. Its genome includes genes encoding
two replicase polyproteins (RNA-dependent RNA
polymerase, i.e., pp1a and pp1ab), encompassing
two-thirds of the genome. In addition to the
replicase, it also encodes the M (membrane) protein,
S (spike) protein, E (envelope) protein, and N
(nucleocapsid) protein. These proteins are conserved
among all coronaviruses. Nine novel open reading
frames (ORFs), i.e., ORF3, 4, 7, 8, 9, 10, 11, 13, and
14, are also known to be present in the genome of
SARS-CoV. Identification of ORFs through sequence
similarity search revealed that there are several
proteins in the SARS virion that are responsible
for the infection.


These proteins include replicase 1a, replicase 1b, the
matrix (M) protein, the spike (S) protein, the
nucleocapsid (N) protein and the small envelope (E)
protein. Coronaviruses are enveloped and contain a
positive-strand RNA with a minimum of four structural
proteins: the M protein, the small E protein, the N
protein and the S protein. These proteins are well
known for their roles in virion budding and receptor
binding. The E protein is an envelope protein of 76
amino acids. It is an important membrane component of
these viruses. Apart from this function, it also has a
role in the coronavirus virion life cycle. The major
aim of this study was to develop an optimized peptide
fragment containing neutralizing epitopes, using
bioinformatics databases and tools, that could be
used as a synthetic vaccine. For this purpose the
“Tor2” strain of the virus was selected, as it was
the first and the most severe strain, which hit the
city of Toronto, Canada, very badly.
FIG 1. LIFE-CYCLE OF SARS CORONAVIRUS.
II. MATERIALS AND METHODS


Reverse Vaccinology approach 
Reverse vaccinology is an important and accepted
approach to vaccine design. The availability of
databases and sequences of pathogen genomes and
various bioinformatics software tools has enabled the
search for vaccine candidates as an in-silico process.
Unlike conventional antigen identification, reverse
vaccinology screens the whole genome for the full
spectrum of potential antigens. It makes it possible
to obtain individual proteins and groups of proteins as
vaccine candidates, including antigens that might
otherwise be missed because of poor expression under
laboratory conditions or because of problems with
culturing the pathogen.


Protein sequence analysis 

The whole proteome of SARS coronavirus Tor2 was
downloaded from the JCVI-CMR database (J Craig
Venter Institute-Comprehensive Microbial Resource)
(http://cmr.jcvi.org/cgibin/CMR/shared/Genomes.cgi).
The functional sequences of the proteins in
FASTA format were obtained from NCBI (National
Center for Biotechnology Information,
www.ncbi.nlm.nih.gov).
A total of 20 functional protein sequences were
obtained, including the structural and the
non-structural proteins. After removing the hypothetical
and putative sequences, the sequences were analyzed
using the TFASTY tool (BLOSUM45, 62 and 80) to
check for similarities with the coding DNA
sequences (CDS) of Homo sapiens.


TFASTY is a local alignment tool used for
comparing a probe protein sequence to a DNA
sequence database and calculating similarities. It
allows substitutions and frame shifts within a codon.


The protein sequences which showed no similarity
with the available library sequences were further
analyzed using the BLAST tool (BLOSUM45, 62
and 80) to check for any similarity with the
proteome of H. sapiens.


The protein BLAST tool, available at
http://blast.ncbi.nlm.nih.gov/Blast.cgi, is a
word-based local alignment algorithm used to compare
the input protein sequences with the protein sequence
library.


The protein sequences showing no significant
similarity with the H. sapiens protein sequences were
selected as vaccine candidates and were used for
immunogenic analysis.

Prediction of antigenicity 

The antigenic properties of the sequences were
computed with EMBOSS Antigenic. This tool helps
in predicting the potentially antigenic regions of a
protein sequence, using the semi-empirical method of
Kolaskar and Tongaonkar. Analysis of data from
experimentally determined antigenic sites has
revealed that the hydrophobic residues Cys, Leu and
Val, if they occur on the surface of a protein, are more
likely to be a part of antigenic sites.


Antigenic peptides were also computed with the
ANTIGENIC PEPTIDE PREDICTION tool
(http://imed.med.ucm.es/Tools/antigenic.pl). The
peptides showing common results for both
programs were selected for further analysis and
structure prediction.


Finding the location in solvent accessible region 

The NetSurfP prediction server was used to predict the
solvent accessible regions of the protein, i.e. the
exposed residues on the surface. The surface
accessibility (SAA) was calculated.


Parker Hydrophilicity prediction 

A hydrophilicity scale was designed based on peptide
retention times in HPLC. The HPLC was
conducted on a reversed-phase column. To analyze
the epitope region, a window of seven residues
was used. The corresponding value of the scale was
introduced for each of the seven residues, and the
arithmetical mean of the seven residue values was
assigned to the fourth, (i+3), residue in the
segment.
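
The windowed averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `window_profile` is hypothetical, and `TOY_SCALE` holds placeholder values chosen only so that the example runs. They are not the published Parker hydrophilicity values.

```python
def window_profile(sequence, scale, window=7):
    """Average a per-residue scale over a sliding window.

    Returns a dict mapping the 1-based position of each window's
    centre residue (the fourth residue, i+3, when window=7) to the
    arithmetic mean of the scale values inside that window.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a centre residue")
    half = window // 2
    profile = {}
    for i in range(len(sequence) - window + 1):
        segment = sequence[i:i + window]
        mean = sum(scale[res] for res in segment) / window
        profile[i + half + 1] = mean  # +1 converts to 1-based numbering
    return profile


# Placeholder values for illustration only (NOT the Parker scale).
TOY_SCALE = {"K": 1.0, "A": 0.0}

profile = window_profile("KKKKKKKAAA", TOY_SCALE)
for position, value in sorted(profile.items()):
    print(position, round(value, 3))
```

Residues closer than three positions to either end of the sequence receive no value, because no full seven-residue window can be centred on them.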


Prediction of Protein Secondary Structure 

The secondary structure of the antigenic peptide can
be predicted using SOPMA. The improved Self
Optimized Prediction Method (SOPMA) correctly 
predicts 69.5% of amino acids for a three-state 
description of the secondary structure (alpha-helix, 
beta-sheet and coil) in a whole database containing 
126 chains of non-homologous (less than 25% 
identity) proteins. Joint prediction with SOPMA and 
a neural networks method (PHD) correctly predicts 
82.2% of residues for 74% of co-predicted amino 
acids. 


Designing and optimization of the vaccine 
candidate 

The candidate vaccine was designed and optimized 
by using SYBYL software. 


III. RESULTS AND DISCUSSION


Selection of the vaccine candidate 
TFASTY searches a nucleic acid database, translated
in all six frames, using a protein query sequence.
TFASTY calculates similarity scores. TFASTY also
incorporates additional code for the alignment of
three frame translations with protein sequences,
allowing for the incorporation of frame-shifts in the
database sequence translations. Because of the extra
translation step, TFASTY searches can take an
exceptionally long time. Alternatives to TFASTY are
TFASTA and TFASTX. TFASTY allows frame
shifts within codons, while TFASTX does not.
TFASTA does not allow for frame shifting.


The variance of the scores is calculated and used 
along with the regression line to determine the 
normalized score, z-opt. 


Karlin and Altschul (1990, PNAS 87:2264) indicate 
that these normalized local similarity scores can be 
described by the extreme value distribution. 
Therefore, the probability of finding normalized 
scores greater than any observed z-opt, P(z > x), can 
be determined. From this, the expected number of 
sequences having a z-opt greater than any observed 
value, E(z >= x), can be calculated by P*D, where 
"D" is the number of sequences in the database. 
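
The relation E = P*D can be illustrated with a short numerical sketch. It assumes that the normalized score x is expressed in standard-deviation units (zero mean, unit variance), so that the extreme value distribution takes its standardized form; the scaling a particular program applies to its reported z-opt is not covered here. The function names and the database size below are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def tail_probability(x):
    """P(z > x) for an extreme value (Gumbel) distribution that has
    been standardized to zero mean and unit variance."""
    exponent = -math.pi / math.sqrt(6.0) * x - EULER_GAMMA
    # -expm1(-t) equals 1 - exp(-t) but stays accurate for tiny t
    return -math.expm1(-math.exp(exponent))


def expected_hits(x, database_size):
    """E(z >= x) = P * D, with D the number of database sequences."""
    return tail_probability(x) * database_size


# Hypothetical database of 10,000 sequences
for x in (0.0, 2.0, 5.0):
    print(x, tail_probability(x), expected_hits(x, 10_000))
```

A larger normalized score gives a smaller tail probability and therefore fewer sequences expected to reach that score by chance.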


Out of a total of 20 proteins in the proteome of the SARS
CoV Tor2 strain, the small envelope protein (orf5)
showed no similarity with the known Homo sapiens
CDS in TFASTY. Further, in BLAST, the protein
showed no significant similarity with the proteome of
Homo sapiens. Hence the protein was selected as the
potential vaccine candidate against the SARS CoV Tor2
strain.


Prediction of Antigenic Peptides 

It is known that specific epitopes on a protein, and not
the whole protein, are responsible for inducing an
immune response. The small envelope protein is 76
residues long. Fig 2 shows the antigenic determinant
plot; the x-axis shows the residue number and the y-axis
shows the average antigenic propensity. The average
antigenic propensity for this protein is 1.1191. There
is one antigenic determinant in the sequence. The
highest peak spans residues 10-61, and the
sequence is


“GTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPTVYVYSR”.


FIG 2. PREDICTION OF ANTIGENICITY IN SMALL ENVELOPE PROTEIN
BY USING KOLASKAR AND TONGAONKAR METHOD


The average antigenic propensity is above 1.0; all
residues above 1.0 are potentially antigenic. The highest
peak in the plot indicates the antigenic site for
attachment.


Surface Accessibility Prediction 


Emini Surface Accessibility Prediction

FIG 3. SHOWING THE SURFACE ACCESSIBLE REGION OF THE
ANTIGENIC DETERMINANT.


A hexapeptide whose surface probability (Sn) is greater
than 1.0 has an increased probability of being present
on the surface.


Antigenic site (sl no.) | Antigenic score | Max score position | SAA (in %)


1.262 44.23 


TABLE 1. TABLE SHOWING THE ANTIGENIC SCORE ALONG WITH 
SURFACE ACCESSIBILITY (SAA) OF ANTIGENIC SITE. 


Solvent accessible region
The solvent accessible region was calculated by using
the NetSurfP server.


Class assignment | Amino acid | Residue no. | RSA | ASA


TABLE 2. TABLE SHOWING THE EXPOSED AREA OF AMINO ACIDS.


E=Exposed, RSA=Relative surface accessibility,
ASA=Absolute surface accessibility


Hydrophilicity prediction
Parker Hydrophilicity Prediction


FIG 4. HYDROPHILICITY OF THE ANTIGENIC DETERMINANT
CALCULATED BY USING THE PARKER METHOD.


A hydrophilic scale based on peptide retention times
during high-performance liquid chromatography
(HPLC) on a reversed-phase column was constructed.
A window of seven residues was used for analyzing
the epitope region. For each of the seven residues the
corresponding scale value was introduced. The
arithmetical mean value of these seven residues was
assigned to the fourth, (i+3), residue in the segment.


Secondary structure prediction


FIG 5. SHOWING THE PERCENTAGE OF HELIX, BETA BRIDGE, COIL
REGION.


The structural analysis of the protein revealed the
presence of a maximum percentage of alpha-helix,
which makes the protein hydrophilic in nature.


B-cell Epitope prediction

B-cell epitopes were predicted within our antigenic
site by using a neural network method to increase the
accuracy of the prediction. A higher score means a
higher probability of being an epitope. All the peptides
shown here are above the threshold value 0.51.


TLAILTALRLCAYCCN
LCAYCCNIVNVSLVKP


TABLE 3. SEQUENCES HAVING THE HIGHER
THRESHOLD VALUE (0.51) FOR B-CELL PREDICTION


The region 30-36 in the antigenic site shows an
overlap of the predicted B-cell epitopes. So, there is
a high chance for this region to be an epitope.


B-cell epitope prediction is useful for the
development of antibodies against SARS infection.
Antibodies to SARS-Coronavirus (SARS-CoV)-specific
B cell epitopes might recognize the pathogen
and interrupt its adherence to and penetration of host
cells. Hence, these epitopes could be useful for
diagnosis and as vaccine constituents. Research
revealed that human monoclonal antibodies from
memory B-cells have been used to neutralize SARS
Coronavirus.

Designing and Energy Optimization 
The peptide was built by using SYBYL software.
The AMBER charge distribution was applied when
building the peptide, and the molecule was found to
have a net charge of 3.0. The initial energy of the
molecule was 838385.780 kcal/mole.


Energy minimization was done by applying the AMBER
force field, and the energy was calculated by the Powell
method. After energy minimization the molecule
showed an energy of -485.873 kcal/mole.


Van der Waals energy        -170.360 kcal/mole
1-4 electrostatic energy     379.065 kcal/mole
Electrostatic energy       -1040.802 kcal/mole


TOTAL ENERGY                -485.873 kcal/mole


TABLE 4. TABLE SHOWING THE ENERGY BREAKDOWN OF THE
ENERGY MINIMIZED STRUCTURE OF THE CANDIDATE VACCINE.


FIG 6. DESIGNED PEPTIDE IN A PERIODIC BOUNDARY
CONDITION (USING SYBYL).


Ramachandran plot 


The designed structure was validated by using a
Ramachandran plot. The result showed that all the
residues fall within the allowed regions.

FIG 7. SHOWING THE RESULT OF RAMACHANDRAN PLOT


Ramachandran plot analysis 


PDB File : Vaccinestr.pdb
Total conformations : 50

Residue  L-Helix  Sheet  R-Helix  FAR  AAR  GAR  Outside  Total
ALA      0        3      0        3    1    0    0        4
ARG      0        1      0        1    0    0    0        1
ASN      0        3      0        3    0    0    0        3
CYS      0        3      0        3    0    0    0        3
ILE      0        3      0        3    0    0    0        3
LEU      0        11     0        11   0    0    0        11
LYS      0        1      0        1    0    0    0        1
PHE      0        3      0        3    0    0    0        3
PRO      0        0      1        1    0    0    0        1
SER      0        3      0        3    0    0    0        3
THR      0        3      0        3    1    0    0        4
TYR      0        3      0        3    0    0    0        3
VAL      0        10     0        10   0    0    0        10
Total    0        47     1        48   2    0    0        50

Fully Allowed Region        ( 48 residues) :  96.00 %
Additionally Allowed Region (  2 residues) :   4.00 %
                                             100.00 %

Alpha helical Region (  1 residues) :  2.08 %
Beta sheet Region    ( 47 residues) : 97.92 %
3-10 helical Region  (  0 residues) :  0.00 %


FIG 8. SHOWING THE ANALYSIS OF RAMACHANDRAN PLOT. 


The result showed that 96% of the residues (48 residues)
fall within the fully allowed region (FAR) and 4% (2
residues) belong to the additionally allowed
region (AAR). No residues are present in the disallowed
region.
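
The percentages quoted above follow directly from the per-residue counts in the Ramachandran listing, as the following arithmetic sketch shows. The counts are copied from that listing; the label ASN is our reading of its third row, and the variable names are hypothetical.

```python
# (fully allowed, additionally allowed) counts per residue type,
# copied from the Ramachandran plot analysis listing.
REGION_COUNTS = {
    "ALA": (3, 1), "ARG": (1, 0), "ASN": (3, 0), "CYS": (3, 0),
    "ILE": (3, 0), "LEU": (11, 0), "LYS": (1, 0), "PHE": (3, 0),
    "PRO": (1, 0), "SER": (3, 0), "THR": (3, 1), "TYR": (3, 0),
    "VAL": (10, 0),
}

fully = sum(f for f, _ in REGION_COUNTS.values())
additional = sum(a for _, a in REGION_COUNTS.values())
total = fully + additional

pct_fully = 100.0 * fully / total
pct_additional = 100.0 * additional / total
print(fully, additional, total, pct_fully, pct_additional)
```

The totals come to 48 and 2 out of 50 conformations, i.e. 96% and 4%, matching the listing.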


IV. CONCLUSION 


Vaccine design is well suited to the application of
in-silico techniques, for the development of new and
existing vaccines. The identification of an effective
vaccine depends on the identification of the composition
of protective antigens. Proteins and polysaccharides are
used as protective antigens in various vaccine
formulations. In the case of polysaccharides, however,
the high level of molecular variation makes it difficult
to sustain vaccine stability effectively. Hence, in this
study, major efforts were directed towards identifying
and testing a protein antigen.


Through this work, potential vaccine candidates were
found by screening the proteome of the
SARS coronavirus Tor2 strain. The small envelope
protein plays an important role in the replication of
the SARS Tor2 strain. The protein is hydrophobic in
nature and most of the amino acids were exposed on
the surface. The potential antigenic sites were
determined by using various immunological
approaches. B-cell epitopes were identified, and the
results showed that the 30-36 region of the antigenic
site might be an epitope. The predicted antigenic
peptides having a score above 1 are considered to be
potentially antigenic and can elicit an immune
response in the human body by acting as epitopes for
the B cells. The candidate peptide, with a
minimized energy of -485.873 kcal/mole, showed
satisfactory results. The chances that this peptide will
induce efficient vaccination against SARS
coronavirus are high because the small envelope
protein, a structural and conserved protein in almost
all SARS coronavirus strains, is directly involved in
the replication and enveloping of the viral protein.
Thus a vaccine targeting this protein will result in the
blocking of the life cycle of the virus and prevent its
spread as a pandemic. The envelope (E) protein from
coronaviruses is a small polypeptide that contains at
least one α-helical transmembrane domain. Absence,
or inactivation, of E protein results in attenuated
viruses, due to alterations in either virion morphology
or tropism. Current work emphasises that the E protein
plays an important and multifunctional role in the
coronavirus virion life cycle. The spike protein S is
involved in viral fusion with host cells, while the
envelope protein E, the membrane protein M and the
nucleocapsid protein N, which binds to the RNA
genome, as well as several additional open reading
frames, are involved in viral replication.